The IFCASL Corpus of French and German Non-native and Native Read Speech
Abstract
The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that there is no phonetically annotated and segmented corpus for this language pair of comparable size and coverage. In contrast to most learner corpora, the IFCASL corpus incorporates data for a language pair in both directions, i.e. in our case French learners of German, and German learners of French. In addition, the corpus is complemented by two sub-corpora of native speech by the same speakers. The corpus provides spoken data from about 100 speakers with comparable productions, annotated and segmented on the word and the phone level, with more than 50% manually corrected data. The paper reports on inter-annotator agreement and the optimization of the acoustic models for forced speech-text alignment in exercises for computer-assisted pronunciation training. Example studies based on the corpus data with a phonetic focus include topics such as the realization of /h/ and glottal stop, final devoicing of obstruents, vowel quantity and quality, pitch range, and tempo.
Similar resources
Inter-annotator agreement for a speech corpus pronounced by French and German language learners
This paper presents the results of an investigation of interannotator agreement for the non-native and native French part of the IFCASL corpus. This large bilingual speech corpus for French and German language learners was manually annotated by several annotators. This manual annotation is the starting point which will be used both to improve the automatic segmentation algorithms and derive dia...
Attractiveness of French Voices for German Listeners - Results from Native and Non-Native Read Speech
This study investigated how the perceived attractiveness of voices was influenced by a foreign language, a foreign accent, and the level of fluency in the foreign language. Stimuli were taken from a French-German corpus of read speech with German native speakers as raters. Additional factors were stimulus length (syllable or entire sentence) and sex (of the raters and speakers). Results with Ge...
Detection of phone boundaries for non-native speech using French-German models
Within the framework of computer assisted foreign language learning for the French/German pair, we evaluate different HMM phone models for detecting accurate phone boundaries. The optimal parameters are determined by minimizing on the non-native speech corpus the number of phones whose boundaries are shifted by more than 20 ms compared to the manual boundaries. We observe that the best performa...
Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process
We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge there is no suitable corpus, in terms of size and coverage, currently available for the target language pair. To select the target L1-L2 interference phenomena we prepare a small preliminary corpus (corpus1), which is a...
Speech-like Pragmatic Markers in Argumentative Essays Written by Iranian EFL Students and Native English Speaking Students
In this study, the use of speech-like pragmatic markers in Iranian EFL students’ academic writing was investigated. Speech-like pragmatic markers, such as I think, well, I guess, actually, anyway, anyhow, etc. are linguistic components that are more specific to conversation than writing, and writers may wrongly include them in their academic writing. To examine the students’ use of speech-like ...